Abstract: Intrusion detection is a continuous process, and the number of packets to be analyzed varies considerably with the size of the network and the volume of transmissions carried over it. Hence there is no fixed data size associated with the problem; instead, the Velocity component of Big Data plays the vital role. Packets are transferred at high speed, so a mechanism that can analyze them in real time becomes mandatory. This paper presents a technique to predict intrusions faster and with higher accuracy. It uses a Random Forest based classifier implemented on the Hadoop platform using Spark. Because Spark supports stream processing, it delivers effective results in real time.

Keywords: Intrusion Detection; Networks; Hadoop; Spark; Random Forest.